The Journal of Parapsychology, Vol. 66, March 2002 (pp. 73-82) 


THE GANZFELD DEBATE CONTINUED: 
A RESPONSE TO MILTON AND 
WISEMAN (2001) 

By Lance Storm and Suitbert Ertel 


ABSTRACT: Most researchers in parapsychological circles and beyond are familiar with the 
ganzfeld debate, which was revived in a series of articles that appeared in Psychological 
Bulletin. This article is a response to J. Milton and R. Wiseman’s (2001) reply to L. Storm and 
S. Ertel (2001), who took issue with J. Milton and R. Wiseman’s (1999a) claim that the 
evidence for psi in the ganzfeld was not replicable. The authors (Storm & Ertel) argue that 
in their reply, J. Milton and R. Wiseman (2001) misrepresented the issues raised in R. 
Hyman and C. Honorton’s (1986) Joint Communique to their advantage. Milton and 
Wiseman wrongly took the standards of the Communique as implying low quality of all 
previous studies and downplayed the accumulated evidence that doubts about the 
credibility of pre-Communique ganzfeld researchers were unwarranted. They wrongfully 
belittled statistical significance, an important contributor to empirical evidence, and on 
mere circumstantial grounds, they ignored the necessity of the bidirectionality test, which is 
acknowledged as a unique psi indicator. The authors reassess the effect sizes for the various 
ganzfeld databases and conclude that Milton and Wiseman’s critique is essentially out of 
place. For future ganzfeld and psi research in general, the authors recommend a 
process-oriented strategy. 


The thrust of our critique of Milton and Wiseman’s (1999a) article is 
condensed in our (Storm & Ertel, 2001) article’s abstract: 

J. Milton and R. Wiseman (1999 [a]) attempted to replicate D. 
Bem and C. Honorton’s (1994) meta-analysis, which yielded 
evidence that the ganzfeld is a suitable method for demonstrating 
anomalous communication. Using a database of 30 ganzfeld and 
autoganzfeld studies, Milton and Wiseman’s meta-analysis 
yielded an effect size ES of only 0.013 (Stouffer Z = 0.70, p = .24, 
one-tailed). Thus they failed to replicate Bem and Honorton’s 
finding (ES = 0.162, Stouffer Z = 2.52, p = 5.90 x 10^-3, one-tailed). 

The authors [Storm & Ertel] conducted stepwise performance 
comparisons between all available databases of ganzfeld 
research. Larger aggregates of such studies were formed, including 
a database comprising 79 ganzfeld/autoganzfeld studies (ES = 
0.138, Stouffer Z = 5.66, p = 7.78 x 10^-9). Thus Bem and 
Honorton’s positive conclusion was confirmed .... The ganzfeld 
appears to be a replicable technique for producing psi effects in 
the laboratory. (p. 424) 

By way of a reply to our (Storm & Ertel, 2001) article, Milton and 
Wiseman (2001) criticized some of our assumptions and procedures. Milton 
and Wiseman’s (2001) response was as follows (taken from their abstract): 

[Storm and Ertel] ignored the well-documented and widely 
recognised methodological problems in the early studies, which 
make it impossible to interpret the results as evidence of extra- 
sensory perception. In addition, Storm and Ertel’s meta-analysis 
is not an accurate quantitative summary of ganzfeld research be- 
cause of methodological problems such as their use of an incon- 
sistent method for calculating study outcomes and inconsistent 
inclusion criterion. (p. 434) 

We address Milton and Wiseman’s (2001) criticisms below in the or- 
der that they appeared in their reply, which is not always in the order of 
importance. 

Are Quotes Taken Out of Context Admissible As Evidence? 

We criticize Milton and Wiseman (2001) for selectively picking quotes 
that feature “psi-questioning” content while ignoring “psi-supporting” ac- 
counts. That is, they started off by (a) strategically quoting, at length, skeptic 
Ray Hyman (see Milton & Wiseman, 2001, p. 434), (b) failing to represent 
Honorton’s (1985) and Hyman and Honorton’s (1986) positive views, and 
(c) using a misleading passage from Hyman and Honorton (see Milton & 
Wiseman, 2001, p. 435). Milton and Wiseman thus quoted Hyman and 
Honorton (1986) out of context (Milton & Wiseman, 2001, p. 435) and 
misrepresented Hyman and Honorton’s (1986) joint overall conclusion, 
which was: “we agree that the overall significance observed in these studies 
cannot reasonably be explained by these selective factors [i.e., “multiple 
testing, retrospective experiments, . . . the file-drawer problem,” etc.]” (p. 
352). Two years later, after further testing, Harris and Rosenthal (1988b) 
reiterated this conclusion: “Our analysis of the effects of flaws on study out- 
come lends no support to the hypothesis that ganzfeld research results are 
a significant function of the set of flaw variables” (p. 3). 

However, Milton and Wiseman (2001, p. 435) placed greater credence 
on the following statement from Hyman and Honorton (1986): “the final 
verdict awaits the outcome of future experiments — ones conducted by a 
broader range of investigators and according to more stringent standards” 
(p. 353). The fact is that Hyman and Honorton also “agreed that the 
significant outcomes have been produced by a number of different 
investigators” (p. 352) and that the argument over “stringent standards” was largely 
rhetorical (p. 353). (Note that we do not object to Milton and Wiseman’s 
appeal to desirable future research for further evidence, but they make 
their point as if past research had been inconclusive.) 

Thus, Milton and Wiseman (2001) clouded the waters and misled 
the unsuspecting reader into thinking that the statistically significant re- 
sult of Honorton’s (1985) database taken at face value contributed little, if 
anything, to the evidence for psi because the methodological issues (see 
Milton & Wiseman, 2001, pp. 434-435) were of greater concern. Yet, it is 
common knowledge that significance testing, aside from assessments of 
effect size, is an indispensable way of finding out whether experimental 
effects should be regarded as existent. The undoubtedly justified de- 
mand for replication, within and between investigators, cannot replace 
the equally important demand for statistical confidence of independent 
studies. Even if psi would seem to entirely disappear, like an ice-age cli- 
mate in earth history, previous significant observations — ice formations 
as in our analogy — would not become invalid. 

Even Hyman and Honorton (1986), it seems, disregarded this logic 
to some extent when they made the distinction between significant ef- 
fects, on the one hand, and evidence for psi (i.e., a communications 
anomaly), on the other. Yet at a very early stage, Rosenthal (1986) insisted 
that the accumulated evidence should not be neglected: “At any point in 
time some judgment can be made .... We feel it would be implausible to 
entertain the null given the combined p from these 28 studies” (p. 333). 
Paraphrasing Rosenthal, our judgment is that psi effects have been evi- 
denced by significant results so that we may rightfully defend our (Storm 
& Ertel, 2001, p. 424) quotes taken from Hyman and Honorton (1986). 

Now, it seems, Hyman’s personal communication to Milton and Wise- 
man (as of September 28, 2000, cited in Milton & Wiseman, 2001, p. 435) 
finds us guilty of a faulty interpretation of his original intent when he said 
that: “the ganzfeld data base had too many problems to be considered as 
evidence for the existence of psi.” But this is Hyman’s personal interpreta- 
tion, and he cannot speak on Honorton’s behalf. Hyman and Honorton 
(1986) actually disagreed over the “degree . . . [of] evidence for psi” (p. 
352), and the two authors differed “about the extent and seriousness of 
[the] departures” from “ideal standards” for the ganzfeld (p. 352). 
Apparently, two stories are being told in the “Joint Communique.” As stated 
above, Hyman and Honorton (1986) agreed that “significant outcomes 
have been produced by a number of different investigators” (p. 352). But 
then they seriously weakened this conclusion by saying: “If a variety of 
parapsychologists and other investigators continue to obtain significant 
results ... a genuine communications anomaly will have been 
demonstrated” (p. 354). This reluctance (indicated by a conditional “if”) to 
accept existing significant replications as grounds to remove uncertainty 
might be due to Honorton’s, the psi-proponent’s, probable dilemma: He 
was forced to accept some of skeptic Hyman’s overskeptical formulations 
to get the “joint” Communique completed. 


Did Honorton’s (1985) Ganzfeld Database Have Flaws? 

Milton and Wiseman (2001) contended that we “denied that there 
were problems in the early ganzfeld studies” and that our denial was made 
“in the face of so much documented evidence to the contrary” (p. 435). 
Did we really deny problems? Specifically, did we deny the fact that earlier 
methods had fewer controls or that actual controls were made less explicit? 
In fact, we did refer to those problems by stating that “claims of alleged 
flaws in Honorton’s, 1985, ganzfeld studies, and his meta-analyses, have 
not been successfully defended” (Storm & Ertel, 2001, p. 426), and, prior 
to that statement, we backed up our argument by saying: “Numerous 
claims that flaws in Honorton’s (1985) meta-analysis still exist have been 
debunked” (Storm & Ertel, 2001, p. 424), and we gave references to that 
effect: specifically, Atkinson, Atkinson, Smith, and Bem (1990), Harris and 
Rosenthal (1988a, 1988b), Saunders (1985), and Utts (1991). 

In other words, we fully acknowledged those “problems” (hypothesized 
artifacts); we did not deny them, we merely said they were solved. Milton and 
Wiseman’s opposition is not based on facts, but on mere doubts over clean 
methods, which were brought forward in three earlier papers: Hyman 
(1985), Honorton (1985), and Hyman & Honorton (1986). Milton and 
Wiseman (2001) turn guesswork into “evidence” (“so much documented ev- 
idence to the contrary,” p. 435) thus making a mountain out of a molehill, 
which no longer exists, while actually ignoring our list of five articles (just 
mentioned above) dating from 1985 to 1991. These up-to-date articles dis- 
prove Milton and Wiseman’s conjectures (Storm & Ertel, 2001, p. 424). 


Is Quality Rating of Earlier Studies Always Necessary? 

Milton and Wiseman (2001) argued that the 11 pre-Communique 
studies used in our meta-analysis should not have been used at all because 
of ostensible “methodological weaknesses” (p. 435). To defend this argu- 
ment, they referred to the Communique, not as “a mere documentation 
of traditional and uncontroversial research rules,” as they should, but in 
order to “justify downgrading the quality of all research published before 
1986,” as already noted in Storm and Ertel (2001, p. 425). The 
Communique was not written as an indictment of prior ganzfeld research, and it 
should never be used as such. 

In our own meta-analysis of 11 “newly found” studies, we endeavored 
to maintain Honorton’s (reappraised and approved) standard in our 
search. Thus, we regard Milton and Wiseman’s (2001) denouncing our 
practice by attributing “no value in performing such a meta-analysis” (p. 
435) as not based on any factual arguments. 

Milton and Wiseman (2001, p. 436) then criticized us for restricting 
our quality ratings to those 11 studies, without rating “the other 68 
studies” (i.e., Honorton’s 28 studies, plus Milton and Wiseman’s 30 studies, 
plus Honorton’s 10 studies). We undertook a quality assessment of those 
11 studies because they had not yet been subjected to the rigors of 
Hymanian techniques of invalidation (as was done to Honorton’s 
database). We also pointed out that the method of quality rating was not 
entirely our own, as Milton and Wiseman (2001, p. 436) falsely assumed, but 
was modeled on Radin and Ferrari’s (1991, pp. 65-66) procedure. 

We did not conduct a similar quality rating on Honorton’s database 
because that database of direct-hit studies had already won acceptance 
from discriminating parapsychologists — it was actually never invalidated, 
neither by Hyman’s nor any other researcher’s analysis (as listed above). It 
should also be clear that quality rating of the post-Communique studies 
would be redundant — Milton and Wiseman themselves regarded them as 
flawless by their own standards. As for the inclusion of direct-hit studies 
only, that criterion was already explained (see Storm & Ertel, 2001, p. 427). 

Effect size is another clarifying issue. We looked at effect sizes of the two 
databases (“pre-Communique” and “post-Communique”) in a number of 
different ways (Storm & Ertel, 2001, pp. 427-429) and found that they did 
not differ significantly. We conducted performance comparisons of (a) 
pre-Communique studies with post-Communique studies and (b) pre- 
Communique authors with post-Communique authors, both of which 
yielded no statistical evidence that the guidelines in the Communique had 
any “influence on effect size outcomes” (p. 430) or any influence on princi- 
pal authors. There was no indication that the mean effect size of the 
pre-Communique database was “inflated” (i.e., an artifact of flaws) because it 
compares favorably with the allegedly “flawless” post-Communique studies. 
And there was no evidence that the mean effect size of the post-Communique 
database was “deflated” because of the removal of purported flaws. 

Apropos of our findings, we refer to Palmer (1986), who warned that 
false conclusions can be drawn on account of, and by appeal to, the Commu- 
nique’s guidelines — it should not be assumed that “past successes were due 
to the presence of the flaws” (p. 379). Thus we provided new evidence sup- 
porting our position that earlier studies do not show any effects of hypothe- 
sized methodological shortcomings. Milton and Wiseman ignored this evi- 
dence altogether. 

Incidentally, an apparently perplexing contradiction escaped the no- 
tice of our two critics in their demand for more extensive quality ratings. 
They claimed there are “obstacles to using quality scales to detect and 
correct for methodological problems in studies” (p. 437) and referred 
extensively to some (irrelevant) medical study. How can Milton and Wise- 
man claim that pre-Communique studies had methodological problems 
(and must therefore be ignored) while claiming, a little while later, that 
the methods used to find that out are doubtful in the first place? For Mil- 
ton and Wiseman, the null hypothesis of psi has never been and, appar- 
ently, will never be rejected — because there are “problems” — even if the 
claim of existing problems turns out to be itself unverifiable, in their view, 
because of further “problems.” 

Furthermore, Milton and Wiseman (2001) generally ignored important 
statements about quality assessment in other meta-analyses (except their 
own). For example, Lawrence (1993) said: “Neither the quality of studies 
nor their effect sizes, has significantly changed over the years” (p. 75). Mil- 
ton ignored even her own coauthored finding of a meta-analytic quality as- 
sessment of forced-choice psi experiments: “There were no statistically sig- 
nificant correlations between the presence of procedural safeguards and 
effect size and hence no suggestion that methodological problems had 
played any strong and obvious role in the overall effects . . . although the 
small database would be expected to provide relatively low statistical power 
for detecting any such effects” (Steinkamp, Milton, & Morris, 1998, p. 193). 
(The sample consisted of 22 study pairs, i.e., of 44 studies!) Thus, if there re- 
ally were such effects and if they were not revealed with 44 studies, the size of 
such effects must have been negligible. 

Another indication of Milton and Wiseman’s tendency to downplay 
psi comes in the form of their conclusion after conducting a meta-analysis 
of psi research via mass media channels (Milton & Wiseman, 1999b). The 
results did not support the psi hypothesis, so they deemed it possible that 
“ESP does not exist and that the mass-media studies accurately estimate its 
effect size as indistinguishable from zero. In this scenario, the positive 
results of the apparently successful meta-analyses would be due to 
methodological flaws” (Milton & Wiseman, 1999b, p. 237). Milton and Wiseman 
apparently did not consider the conflict of their “scenario” with the bulk of 
positive results accumulated over decades of parapsychological experi- 
mentation and, above all, the entire absence of empirical evidence for 
“methodological flaws.” It is lack of such empirical evidence that has been 
accumulating. Milton and Wiseman took the liberty to ignore it. 

Milton even downplayed “a highly significant cumulative effect 
(Stouffer Z = 5.72)” and an appreciable mean ES of 0.16 in her 
meta-analysis of free-response ESP studies (Milton, 1997, p. 279) by pointing out 
possibilities of artifacts (data analyses were possibly not preplanned, or 
authors failed to report whether their analyses were preplanned). However, 
she did not provide any empirical indication that studies without such re- 
ports gave rise to the suspicion that the authors of her sample were meth- 
odologically less sophisticated than her. Note also that Milton and Wise- 
man did not consider at all the possible lack of psi-conducive conditions 
(“flaws”) in mass media studies. These were merely characterized as hav- 
ing very “different” conditions; they do not characterize them as “proba- 
bly unfavorable” conditions. 

Does a Revised Effect Size Calculation Make Any Difference? 

Milton and Wiseman (2001, p. 436) took issue with the lack of 
conservative calculations of some z scores for studies in the 11-study database. In 
fact, only 3 of the 11 studies needed adjustment, thus reducing the 
quality-weighted mean z score from .32 (ES = .14; Stouffer Z = 1.06, p = .144) to 
.26 (ES = .13; Stouffer Z = 0.87, p = .192). The 11-study database is still not 
significantly different from Honorton’s (1985) 28-study database, t(37) = 
.61, p = .543, two-tailed. Thus, the old ganzfeld database can still be formed. 
It has a mean z of .97 (ES = .225; Stouffer Z = 6.05, p = 7.24 x 10^-10), results 
of which are comparable with Storm and Ertel’s (2001, p. 429) original 
data for the old ganzfeld database: mean z of .99 (ES = .227; Stouffer Z = 
6.15, p = 3.93 x 10^-10). 
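
As a rough check on the figures above, the following sketch recomputes a Stouffer Z and its one-tailed p from a mean z score and a study count. It assumes the plain unweighted Stouffer combination (sum of z scores divided by the square root of the number of studies), which may differ from the quality-weighted procedure actually used for the 11-study database; the function names are illustrative only, and the inputs are the rounded means quoted in the text, so results agree with the reported values only approximately.

```python
import math


def stouffer_z(mean_z: float, k: int) -> float:
    """Unweighted Stouffer Z: sum(z) / sqrt(k), i.e. mean_z * sqrt(k)."""
    return mean_z * math.sqrt(k)


def one_tailed_p(z: float) -> float:
    """Upper-tail probability of a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# (label, mean z, number of studies), taken from the text.
# The old database is Honorton's 28 studies plus the 11 newly found ones.
databases = [
    ("11-study database (revised)", 0.26, 11),
    ("old ganzfeld database (revised)", 0.97, 28 + 11),
]

for label, mean_z, k in databases:
    z = stouffer_z(mean_z, k)
    print(f"{label}: Stouffer Z = {z:.2f}, one-tailed p = {one_tailed_p(z):.3g}")
```

Because the means are rounded to two decimals, the recomputed Z values can differ from the reported ones in the second decimal place.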

The “old” and the “new” ganzfeld databases are significantly 
different, t(77) = 3.04, p = .003, ω^2 = .09, but the omega-squared value (9%) is 
now exactly that of the critical value stipulated in Storm and Ertel’s 
(2001) paper. But Cohen’s (1988) test, as originally applied by Storm and 
Ertel, shows that the difference is again not significant. When the two 
databases are combined, the 79-study database has a mean z score only 
slightly reduced from .64 to .63 (ES = .14; Stouffer Z = 5.59, p = 1.14 x 
10^-8). This “revised” larger database, representing once again a unified 
ganzfeld domain, might indicate that over two decades of ganzfeld/auto- 
ganzfeld work, again dismissed by Milton and Wiseman (2001) almost 
out of hand, has in fact not been in vain. 
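
The omega-squared value for the old-versus-new comparison can be approximated from the reported t statistic. The sketch below assumes the common textbook estimate for two independent groups, (t^2 - 1) / (t^2 + n1 + n2 - 1); that this is the formula behind the reported value is an assumption, and the function name is illustrative. The group sizes are the study counts given in the text (28 + 11 old studies, 30 + 10 new studies).

```python
def omega_squared(t: float, n1: int, n2: int) -> float:
    """Estimated omega-squared for an independent two-group t test."""
    return (t ** 2 - 1.0) / (t ** 2 + n1 + n2 - 1.0)


old_n = 28 + 11  # Honorton's (1985) studies plus the 11 newly found studies
new_n = 30 + 10  # Milton and Wiseman's 30 studies plus the 10 autoganzfeld studies

w2 = omega_squared(3.04, old_n, new_n)
print(f"omega-squared = {w2:.3f}")
```

Under these assumptions the estimate rounds to .09, in line with the value reported for the comparison.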

Is Testing for Bidirectional Psi Not Legitimate Procedure? 

We performed tests for bidirectional psi (Storm & Ertel, 2001, p. 429), 
and results were positive throughout. Bidirectional effects appeared in all 
four databases indicating that, had Milton and Wiseman proposed a 
bidirectional hypothesis, their results would have supported their replica- 
tion trial even under their own unfavorable condition of limited data 
selection. 

Milton and Wiseman’s (2001, p. 436) reasons for disregarding bidirec- 
tionality are unacceptable. They referred to the fact that this form of anal- 
ysis played no role in Hyman and Honorton’s (1986) study or previous 
studies, and that “interest in testing for extreme dispersion” (p. 436) is a 
phenomenon that appeared only after their (Milton & Wiseman, 1999a) 
initial study. As it happens, bidirectionality has been regarded as a unique 
psi feature for five decades (Rao, 1965; Rhine, 1952). Thus we did not 
suggest a new approach to psi testing. Once attention is drawn to some 
relevant testing procedure, even if forgotten by most researchers, it must 
still be regarded as legitimate at any time. 

In a final effort to downplay our finding of a significant bidirectional 
effect, Milton and Wiseman (2001) regarded the probability level of p = 
.027 for that effect in their 30-study database as “marginal” (p. 436); they 
considered that it did not “carry much weight” (p. 436) because it was a 
“post hoc analysis” (p. 436). Need we remind them that, by convention, 
chance explanations are rejected when p is less than or equal to .05 and 
that researchers are bound in that case to find explanations? 

We regard Milton and Wiseman’s wanton dismissal of the 5% rule, by 
labeling our decision “post hoc,” as theoretically and statistically ground- 
less. Not only were many of their own decisions made post hoc, but ignor- 
ing the 5% rule might be an act expected of skeptics in their burgeoning 
need for an ever-more “creative” and conditional interpretation of signifi- 
cant results whenever the need arises to undermine the evidence of an 
anomalous effect. 

How Much Evidence Is Needed to Convince Neutral Scientists? 

Milton and Wiseman (2001) reported the statistic that “only half” of a 
limited number of respondents (members of the ganzfeld research com- 
munity) to an electronic mail forum (i.e., about 10 people!) “thought 
that the experimental evidence for psi was currently strong enough to 
convince a neutral scientist” (p. 437). Need we say that the bottle is half 
full? The fact that half of it is still empty appears reasonable after consid- 
ering that respondents were not asked to indicate their own conclusions 
but were asked to guess conclusions by some “neutral,” that is, skeptical, 
but unprejudiced researcher, implying that the neutral researcher’s 
knowledge of the field was not broad-based. Those 10 respondents con- 
tributing to the “half-empty” kind of responses might have considered 
typical obstacles by “neutral” observers from a distance, while being con- 
vinced themselves — just as the other 10 respondents were — that the evi- 
dence was sufficient. Schmeidler and Edge (1999), whose article con- 
tained strong affirmative arguments, might nevertheless have replied to 
this questionnaire item that they doubt that “neutral” observers would re- 
gard the existing bulk of evidence as sufficient. 

Milton and Wiseman (2001, p. 437) also informed us that only 17% of 
all respondents (fewer than 4 people!) “thought that the procedures nec- 
essary for producing a reasonably replicable ganzfeld psi effect had as yet 
been identified” (p. 437). But this common deficit of parapsychological 
research in general, which is due to our ignorance in this field of necessary 
causal agents, cannot be reason to doubt psi phenomena. 

Recommended Strategy for Looking Into 
Determinants of the Ganzfeld Effect 

Milton and Wiseman’s recommendation for future research includes 
the establishment of “a possible strategy for attempting replication” of ganzfeld 
effects (italics added; Milton & Wiseman, 2001, p. 437). In our view, the real 
focus should be on process-oriented work (cf. E. May’s personal 
communication, March 8, 2001, in which he stated: “Further ‘proof’ oriented work is a 
waste of resources” given the weight of evidence for psi). 

A movement toward identifying psi-conducive conditions must now be 
seen as more important than the ongoing debate over replication, which is 
often protracted by authors who, despite being well informed about the 
wealth of experimental findings in parapsychology, take pleasure in erod- 
ing away the psi findings by dubious means, apparently not unaware of im- 
mediate reinforcement from mainstream circles. We need to come closer 
to discovering elements of the true nature of ganzfeld phenomena, and psi 
in general. That shift in focus would be a movement away from 
unproductive styles of 20th-century research toward new horizons, it is hoped, of 
understanding the reality of the paranormal. 